
Cristiano Ronaldo dos Santos Aveiro
He currently holds the record for the most goals scored in men's international football.
Current team: Portugal national football team (#7 / Forward)
Born: February 5, 1985 (age 37 years), Hospital Dr. Nélio Mendonça, Funchal, Portugal
Manchester United
The Premier League:
The Premier League is the top tier of England's football pyramid, with 20 teams battling it out for the honour of being crowned English champions.
It is also the most-watched league on the planet, followed in one billion homes across 188 countries, and it is home to some of the most famous clubs, players, managers and stadiums in world football, one of them being Manchester United.
Football
Football is a team sport played between two sides, each with a maximum of 11 players on the pitch. Each game consists of two 45-minute halves separated by a 15-minute break. A football match, however, does not always finish after exactly 90 minutes: stoppage time is added to make up for time lost to substitutions, injuries and disciplinary actions.
One of the most thrilling positions in football is forward. A forward's job is to score goals and to put pressure on opponents so that they make errors. Great forwards also need good dribbling, shooting and heading skills.
#Library for data analysis and processing
import numpy as np
import pandas as pd
import datetime as dt
import seaborn as sns
import re  # regular expressions (standard library)
import matplotlib.pyplot as plt
from matplotlib.cbook import get_sample_data
from matplotlib.offsetbox import (OffsetImage, AnnotationBbox)
import os
#Plotly Library to make graphs
import plotly.io as pio
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.express as px
#Library to be used for reading json file
import json
#Bokeh Library to make graphs
import bokeh
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import DataTable, DateFormatter, TableColumn
#Library to display images
from IPython.display import Image, display
#Use the output_notebook() function to display Bokeh plots in Jupyter notebook
output_notebook()
#Library to ignore warnings
import warnings
warnings.filterwarnings('ignore')
https://www.kaggle.com/datasets/evangower/premier-league-matches-19922022
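This dataset, acquired from Kaggle, includes every game played in the English Premier League from its start in 1992 through the last week of the 2021–22 season. Each season lasts about a year, and 20 teams compete in each season (22 in the first three seasons, 1992–93 to 1994–95). For each match it records the season (by the year in which it ended), the week, the date, the home and away teams, the goals scored by each side and the full-time result (FTR: H for a home win, D for a draw, A for an away win).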
https://en.wikipedia.org/wiki/List_of_Manchester_United_F.C._seasons
This Wikipedia page lists Manchester United's record in major competitions for every season since the club was founded. We use the 'Results of league and cup competitions by season' table to find the club's league position in each season, including the seasons in which it won the Premier League title.
https://www.nature.com/articles/s41597-019-0247-7
A public dataset of spatio-temporal match events from football matches. Each event records its position, time, outcome, player and other characteristics. We downloaded the JSON files for the Spanish La Liga matches and events, read them and converted them to pandas DataFrames for analysis.
https://www.kaggle.com/datasets/azminetoushikwasi/cr7-cristiano-ronaldo-all-club-goals-stats
This Kaggle dataset lists all of Ronaldo's club goals. We used its CSV files for processing and analysis.
As mentioned above, we used six datasets in different file formats (CSV and JSON), and we also collected data from a publicly available website, Wikipedia. We followed a series of pre-processing steps to get each DataFrame ready for analysis.
Manchester United in the last 20 years - We analyze how the club maintained its consistency, winning the league three times in a row, using the "Season" and "Standing" columns.
Home ground analysis at Manchester United - We compare Manchester United's home win percentage when Ronaldo was at the club with the seasons when he was not.
Ronaldo's goal-scoring patterns - For the La Liga dataset, we merged data from two DataFrames (events and matches) and looked for patterns in the positions from which Ronaldo attempted shots and from which he scored. The analysis included counts of his goals, assists and shots and maps of his shot and goal positions on the pitch.
# Importing the Manchester United season's data from wikipedia
man_united_seasons = pd.read_html('https://en.wikipedia.org/wiki/List_of_Manchester_United_F.C._seasons#Seasons', header=0,match='Season')[0]
# Display the dataframe
man_united_seasons.head()
| | Season | League | League.1 | League.2 | League.3 | League.4 | League.5 | League.6 | League.7 | League.8 | League.9 | FA Cup | EFL Cup | CommunityShield | UEFAFIFA | Top goalscorer(s)[a] | Top goalscorer(s)[a].1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Season | Division | Tier | Pld | W | D | L | GF | GA | Pts | Pos | FA Cup | EFL Cup | CommunityShield | UEFAFIFA | Name(s) | Goals |
| 1 | 1886–87[b] | — | NaN | — | — | — | — | — | — | — | — | R1 | NaN | NaN | NaN | Jack Doughty | 4 |
| 2 | 1888–89[c] | Combination | NaN | 12 | 8 | 2 | 2 | 27 | 13 | 18 | — | — | NaN | NaN | NaN | Jack DoughtyRoger Doughty | 6 |
| 3 | 1889–90 | Alliance | NaN | 22 | 9 | 2 | 11 | 40 | 45 | 20 | 8th | R1 | NaN | NaN | NaN | Willie Stewart | 10 |
| 4 | 1890–91 | Alliance | NaN | 22 | 7 | 3 | 12 | 37 | 55 | 17 | 9th | QR2 | NaN | NaN | NaN | Bob Ramsay | 7 |
From this DataFrame we need only two columns, "Season" and "Pos".
Let us rename the Pos column to Standing, as its values give the club's league position in that particular season.
# Use the first row as the header, since the scraped header only holds group labels
man_united_seasons.columns = man_united_seasons.iloc[0]
# Keep only the Season and Pos columns, and drop the first row (now the header)
man_united_seasons = man_united_seasons[['Season', 'Pos']]
man_united_seasons = man_united_seasons.iloc[1:, :]
# Rename the Pos column to Standing
man_united_seasons.columns = ['Season', 'Standing']
# Check for duplicates in the dataset
duplicates = man_united_seasons.duplicated().sum()
print(f'Number of duplicates in the dataset are: {duplicates}')
# Identifying the null values
null_values = man_united_seasons.isnull().sum().sum()
print(f'Number of null values in the dataset are: {null_values}')
# Display the dataframe
man_united_seasons.head()
Number of duplicates in the dataset are: 0
Number of null values in the dataset are: 0
| | Season | Standing |
|---|---|---|
| 1 | 1886–87[b] | — |
| 2 | 1888–89[c] | — |
| 3 | 1889–90 | 8th |
| 4 | 1890–91 | 9th |
| 5 | 1891–92 | 2nd[d] |
Here, we checked the cleaned DataFrame for duplicate and null values. Summing the Boolean results of duplicated() and isnull() shows that there are no duplicates or nulls. Note that placeholder values such as "—" are text, not nulls; the next step turns them into nulls and drops them. Let us proceed with the analysis:
# Keep only the first four-digit year of each season label, e.g. "2001–02" becomes 2001
# Note: each season is therefore labelled by the year in which it started
pattern2 = r'(\d{4})'
man_united_seasons['Season'] = man_united_seasons['Season'].str.extract(pattern2, expand=False)
# Keep only the numeric league position (one or two digits), e.g. "2nd[d]" becomes 2
pattern1 = r'(\d{1,2})'
man_united_seasons['Standing'] = man_united_seasons['Standing'].str.extract(pattern1, expand=False)
# Drop rows where no year or position could be extracted (e.g. "—")
man_united_seasons = man_united_seasons.dropna().copy()
# Convert the column values to integer
man_united_seasons['Season'] = man_united_seasons['Season'].astype(int)
man_united_seasons['Standing'] = man_united_seasons['Standing'].astype(int)
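To make the two regular expressions concrete, the sketch below applies the same patterns to a few hand-written labels in the style of the Wikipedia table; the sample values are illustrative, not copied from the scraped page. It shows that only the first four-digit year survives (so a season is labelled by the year in which it started), that footnote markers such as [d] are discarded, and that a row whose position is a placeholder dash yields NaN and is dropped.

```python
import pandas as pd

# Illustrative labels in the same style as the scraped Wikipedia table
sample = pd.DataFrame({
    'Season': ['1886–87[b]', '1891–92', '2001–02', '2013–14'],
    'Standing': ['—', '2nd[d]', '3rd', '7th'],
})

# First four-digit run: the year in which the season started
sample['Season'] = sample['Season'].str.extract(r'(\d{4})', expand=False)
# First one- or two-digit run: the league position, without footnote markers
sample['Standing'] = sample['Standing'].str.extract(r'(\d{1,2})', expand=False)

# Rows without a numeric position (e.g. '—') become NaN and are dropped
sample = sample.dropna().astype(int)
print(sample.to_string(index=False))
```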
Let's look at Manchester United's league standings over recent years:
# Keep only the seasons labelled after 2000 (2001–02 onwards) for analysis
man_united_seasons = man_united_seasons.loc[man_united_seasons['Season']>2000]
# Reset Index
man_united_seasons.reset_index(drop=True, inplace=True)
# Display the dataframe
man_united_seasons.head()
| | Season | Standing |
|---|---|---|
| 0 | 2001 | 3 |
| 1 | 2002 | 1 |
| 2 | 2003 | 3 |
| 3 | 2004 | 3 |
| 4 | 2005 | 2 |
def imscatter(x, y, image, ax=None, zoom=1):
    # Draw `image` as a marker at each (x, y) point on the given axes
    if ax is None:
        ax = plt.gca()
    try:
        image = plt.imread(image)
    except TypeError:
        # Likely already an array...
        pass
    im = OffsetImage(image, zoom=zoom)
    x, y = np.atleast_1d(x, y)
    artists = []
    for x0, y0 in zip(x, y):
        ab = AnnotationBbox(im, (x0, y0), xycoords='data', frameon=False)
        artists.append(ax.add_artist(ab))
    ax.update_datalim(np.column_stack([x, y]))
    ax.autoscale()
    return artists
# Define the x and y axes
x = man_united_seasons['Season'].values.astype(int)
y = man_united_seasons['Standing'].values
# Crown icon image, stored next to the notebook
image_path = 'Crown.png'
fig, ax = plt.subplots(figsize=(15, 4))
# Plot the crown at positions 5 to 7 (seasons labelled 2006, 2007 and 2008: three titles in a row)
imscatter(x[5:8], y[5:8], image_path, zoom=0.1, ax=ax)
plt.title('Manchester United Standing in English Premier League')
plt.xlabel('Year')
plt.ylabel('Rank')
# Invert the y axis so that 1st place is at the top
plt.ylim(8, 0)
plt.xticks(x, rotation=45)
ax.plot(x, y)
plt.show()
Ronaldo had a strong understanding with his teammates at Manchester United. From the above line graph, we can see that the club won the English Premier League title three times in a row, in the 2006–07, 2007–08 and 2008–09 seasons (plotted as 2006, 2007 and 2008). In 2009, Ronaldo left the club and joined Real Madrid. His transfer had a large impact on Manchester United's dynamics and style of play, as they could not replace him with a player of similar calibre: his ability to convert free kicks and penalties into goals was extraordinary, and he was in his prime, attracting the attention of football fans worldwide.
After his departure, the club could not maintain that consistency. It won the title again in 2010–11 and 2012–13, but its league position fluctuated much more in the seasons that followed.
During the 2013–14 season (plotted as 2013), the club fell to 7th place, its worst finish of the past 20 years.
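The three-in-a-row run and the worst finish can also be read off the DataFrame rather than the chart. A minimal sketch, assuming a DataFrame shaped like man_united_seasons with integer Season and Standing columns; the helper names longest_title_streak and worst_finish are our own, and the sample values are only illustrative.

```python
import pandas as pd

def longest_title_streak(seasons: pd.DataFrame) -> tuple[int, int]:
    """Return (first season, length) of the longest run of consecutive 1st-place finishes."""
    ordered = seasons.sort_values('Season')[['Season', 'Standing']]
    best_start, best_len = 0, 0
    run_start, run_len, prev = 0, 0, None
    for season, standing in ordered.itertuples(index=False):
        if standing == 1:
            # Extend the current run only if this season directly follows the previous one
            if run_len > 0 and season == prev + 1:
                run_len += 1
            else:
                run_start, run_len = season, 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
        prev = season
    return int(best_start), best_len

def worst_finish(seasons: pd.DataFrame) -> tuple[int, int]:
    """Return (season, standing) of the lowest league finish (largest Standing value)."""
    row = seasons.loc[seasons['Standing'].idxmax()]
    return int(row['Season']), int(row['Standing'])

# Illustrative values in the shape of man_united_seasons
sample = pd.DataFrame({'Season': list(range(2005, 2014)),
                       'Standing': [2, 1, 1, 1, 2, 1, 2, 1, 7]})
print(longest_title_streak(sample), worst_finish(sample))
```

Calling both helpers on man_united_seasons itself would give the streak and the worst season directly from the scraped data.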
The term "home-field advantage" describes the presumed advantage that the side playing at home has during a match: it does not have to travel and plays in familiar surroundings. Away teams, by contrast, must travel, which can be a significant source of fatigue.
# Load dataset for analysis
premier_league_df = pd.read_csv('Premier_League_data.csv')
# Display output printing first 5 rows
premier_league_df.head()
| | Season_End_Year | Wk | Date | Home | HomeGoals | AwayGoals | Away | FTR |
|---|---|---|---|---|---|---|---|---|
| 0 | 1993 | 1 | 1992-08-15 | Coventry City | 2 | 1 | Middlesbrough | H |
| 1 | 1993 | 1 | 1992-08-15 | Leeds United | 2 | 1 | Wimbledon | H |
| 2 | 1993 | 1 | 1992-08-15 | Sheffield Utd | 2 | 1 | Manchester Utd | H |
| 3 | 1993 | 1 | 1992-08-15 | Crystal Palace | 3 | 3 | Blackburn | D |
| 4 | 1993 | 1 | 1992-08-15 | Arsenal | 2 | 4 | Norwich City | A |
The "Wk" column denotes the match week. Since the week plays no significant role in determining home-ground results, we will not use this column in the analysis and therefore drop it.
# Drop the Wk column from the dataset
premier_league_df = premier_league_df.drop('Wk', axis=1)
# Check for duplicates in the dataset
duplicates = premier_league_df.duplicated().sum()
print(f'Number of duplicates in the dataset are: {duplicates}')
# Identify the null values
null_values = premier_league_df.isnull().sum().sum()
print(f'Number of null values in the dataset are: {null_values}')
Number of duplicates in the dataset are: 0
Number of null values in the dataset are: 0
Here, we checked the dataset for duplicate and null values. Summing the Boolean results of duplicated() and isnull() shows that there are no duplicates or nulls in the dataset.
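Beyond duplicates and nulls, the FTR column can be cross-checked against the goal columns: a home win (H) when HomeGoals is greater than AwayGoals, an away win (A) when it is smaller, and a draw (D) otherwise. A minimal sketch using numpy.select, assuming the column names shown in the table above; the helper name derive_ftr is ours and the three-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

def derive_ftr(matches: pd.DataFrame) -> pd.Series:
    """Recompute the full-time result from the goal columns."""
    conditions = [
        matches['HomeGoals'] > matches['AwayGoals'],  # home win
        matches['HomeGoals'] < matches['AwayGoals'],  # away win
    ]
    # Anything that is neither a home nor an away win is a draw
    return pd.Series(np.select(conditions, ['H', 'A'], default='D'), index=matches.index)

# Made-up rows in the same shape as premier_league_df
matches = pd.DataFrame({
    'HomeGoals': [2, 3, 2],
    'AwayGoals': [1, 3, 4],
    'FTR': ['H', 'D', 'A'],
})
mismatches = (derive_ftr(matches) != matches['FTR']).sum()
print(f'Rows where FTR disagrees with the score: {mismatches}')
```

Applied to premier_league_df, a non-zero count would point to rows worth inspecting before computing any percentages.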
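We want to analyse how Ronaldo's performances affected Manchester United's chances of winning at home. We compare the club's home results for Season_End_Year 2003 to 2009 with those for all its other seasons. Note that this window also includes the 2002–03 season (Season_End_Year 2003), before Ronaldo joined in August 2003, and that the comparison group includes 2021–22, when he had returned to the club, so the split only approximates his first stay.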
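Ronaldo joined Manchester United in August 2003, left in 2009 and returned in August 2021, so in terms of Season_End_Year his spells cover 2004 to 2009 and 2022 (the last season in this dataset). The sketch below shows how home games could be split along those spells instead of a single year range; the helper name split_by_ronaldo and the RONALDO_SEASONS set are our own assumptions, not part of the original analysis, and the five rows are made up.

```python
import pandas as pd

# Assumed Season_End_Year values for Ronaldo's two spells at the club:
# 2003-04 to 2008-09, and 2021-22 (the last season in this dataset)
RONALDO_SEASONS = set(range(2004, 2010)) | {2022}

def split_by_ronaldo(home_games: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split home games into (with Ronaldo, without Ronaldo) by season end year."""
    with_him = home_games['Season_End_Year'].isin(RONALDO_SEASONS)
    return home_games[with_him], home_games[~with_him]

# Made-up home games, one per season, to show where each season ends up
games = pd.DataFrame({'Season_End_Year': [2003, 2004, 2009, 2010, 2022],
                      'FTR': ['H', 'H', 'D', 'A', 'H']})
with_ronaldo, without_ronaldo = split_by_ronaldo(games)
print(sorted(with_ronaldo['Season_End_Year']))
```

Passing mutd_home_games instead of games would give the two groups on which the win, loss and draw percentages could then be recomputed.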
To determine whether the club benefits from playing at home, let us calculate the percentage of home wins, losses and draws:
# Factor to convert a fraction into a percentage
percentage = 100
# Manchester United home games outside Ronaldo's first stay (Season_End_Year before 2003 or after 2009)
mutd_home_games = premier_league_df[premier_league_df.Home == 'Manchester Utd']
mutd_no_ronaldo = mutd_home_games[(mutd_home_games['Season_End_Year'] < 2003) | (mutd_home_games['Season_End_Year'] > 2009)]
# Percentage of home games won (FTR == 'H': the home side won)
pct_home_wins = (len(mutd_no_ronaldo[mutd_no_ronaldo['FTR'] == 'H']) / len(mutd_no_ronaldo)) * percentage
# Percentage of home games lost (FTR == 'A': the away side won)
pct_home_loss = (len(mutd_no_ronaldo[mutd_no_ronaldo['FTR'] == 'A']) / len(mutd_no_ronaldo)) * percentage
# Percentage of home games drawn (FTR == 'D')
pct_home_draw = (len(mutd_no_ronaldo[mutd_no_ronaldo['FTR'] == 'D']) / len(mutd_no_ronaldo)) * percentage
# Values for the donut chart, in the order win, loss, draw
percentages_no_ronaldo = [pct_home_wins, pct_home_loss, pct_home_draw]
# Display the output
print(f'Manchester United win percentage at home games: {pct_home_wins:0.2f}%')
print(f'Manchester United lost percentage at home games: {pct_home_loss:0.2f}% ')
print(f'Manchester United draw percentage at home games: {pct_home_draw:0.2f}%')
Manchester United win percentage at home games: 67.95%
Manchester United lost percentage at home games: 12.19%
Manchester United draw percentage at home games: 19.86%
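The three percentages can also be computed in a single call, since value_counts(normalize=True) returns the share of each FTR value. A sketch of a reusable helper (the name home_result_percentages is ours), shown on a made-up set of results:

```python
import pandas as pd

def home_result_percentages(home_games: pd.DataFrame) -> pd.Series:
    """Percentage of home games won (H), lost (A) and drawn (D)."""
    shares = home_games['FTR'].value_counts(normalize=True) * 100
    # reindex keeps a fixed order and fills 0 for results that never occurred
    shares = shares.reindex(['H', 'A', 'D'], fill_value=0)
    return shares.rename(index={'H': 'Win', 'A': 'Lost', 'D': 'Draw'})

# Made-up results: 5 wins, 1 loss, 2 draws
games = pd.DataFrame({'FTR': ['H', 'H', 'H', 'D', 'A', 'H', 'D', 'H']})
print(home_result_percentages(games).round(2))
```

Calling home_result_percentages(mutd_no_ronaldo) should reproduce the three figures printed above, and the reindex step keeps the order expected by the donut chart.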
# Manchester United home games during Ronaldo's first stay (Season_End_Year 2003 to 2009)
mutd_with_ronaldo = mutd_home_games[(mutd_home_games['Season_End_Year'] >= 2003) & (mutd_home_games['Season_End_Year'] <= 2009)]
# Percentage of home games won
pct_home_wins = (len(mutd_with_ronaldo[mutd_with_ronaldo['FTR'] == 'H']) / len(mutd_with_ronaldo)) * percentage
# Percentage of home games lost
pct_home_loss = (len(mutd_with_ronaldo[mutd_with_ronaldo['FTR'] == 'A']) / len(mutd_with_ronaldo)) * percentage
# Percentage of home games drawn
pct_home_draw = (len(mutd_with_ronaldo[mutd_with_ronaldo['FTR'] == 'D']) / len(mutd_with_ronaldo)) * percentage
# Values for the donut chart, in the order win, loss, draw
percentages_with_ronaldo = [pct_home_wins, pct_home_loss, pct_home_draw]
# Display the output
print(f'Manchester United win percentage at home games: {pct_home_wins:0.2f}%')
print(f'Manchester United loss percentage at home games: {pct_home_loss:0.2f}% ')
print(f'Manchester United draw percentage at home games: {pct_home_draw:0.2f}%')
Manchester United win percentage at home games: 75.94%
Manchester United loss percentage at home games: 7.52%
Manchester United draw percentage at home games: 16.54%
# Helper class holding the ANSI escape code for bold printed text
class color:
    BOLD = '\033[1m'
#Plot Donut chart for Home Ground Wins
values = percentages_with_ronaldo
colours = ['#224676', '#B93114', '#04152B']
labels=['Winning %', 'Losing %', 'Tie %']
trace1 = {'values': values,
'labels': labels,
'marker': {'colors': colours},
'type': 'pie',
'hole': 0.6,
'title': '2003-2009',
'showlegend': True}
print(color.BOLD + '% of Home Ground Wins with Ronaldo Playing for Manchester United (2003 - 2009)')
pio.show({'data': [trace1]})
values1 = percentages_no_ronaldo
trace2 = {'values': values1,
'labels': labels,
'marker': {'colors': colours},
'type': 'pie',
'hole': 0.6,
'title': 'Before 2003, after 2009',
'showlegend': True}
print(color.BOLD + '% of Home Ground Wins without Ronaldo Playing for Manchester United (Before 2003 and after 2009)')
pio.show({'data': [trace2]})
% of Home Ground Wins with Ronaldo Playing for Manchester United (2003 - 2009)
% of Home Ground Wins without Ronaldo Playing for Manchester United (Before 2003 and after 2009)
The donut charts above show that Manchester United's home win percentage was higher while Ronaldo was at the club: 75.94%, compared with 67.95% in the other seasons.
The home loss percentage was also lower with Ronaldo at the club, falling from 12.19% to 7.52%, a good sign for the team.
These figures suggest that Ronaldo boosted the club's success in the Premier League: during his stay, the home win percentage increased and the home loss percentage decreased.
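Whether a gap between two win percentages is larger than chance alone would plausibly produce can be checked with a two-proportion z-test. A minimal standard-library sketch; the function name two_proportion_z_test is ours, and the test assumes independent games, which is only an approximation for football results. The numbers in the example are made up, not taken from the dataset.

```python
import math

def two_proportion_z_test(wins_a: int, games_a: int, wins_b: int, games_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for H0: both groups share one win rate."""
    p_a, p_b = wins_a / games_a, wins_b / games_b
    # Pooled win rate under the null hypothesis
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution: P(|Z| >= |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative numbers only: 60 wins in 80 games vs 120 wins in 200 games
z, p = two_proportion_z_test(60, 80, 120, 200)
print(f'z = {z:.2f}, p = {p:.3f}')
```

With the notebook's DataFrames, the inputs would be the number of rows with FTR == 'H' and the total number of rows in mutd_with_ronaldo and in mutd_no_ronaldo.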
# Load datasets for analysis
# Creating a dataframe for Ronaldo's career goals
df = pd.read_csv("data.csv")
df_o = pd.read_csv("overall.csv")
# Count the goals for each (Club, Year) pair; naming the count column explicitly
# works whether pandas calls it 0 (pandas 1.x) or 'count' (pandas 2.x)
season_df = df[['Club', 'Year']].value_counts().reset_index(name='Count')
# Add .png to the club name to get the logo file used when visualizing
season_df['path'] = season_df['Club'] + '.png'
# Check for duplicates in the dataframe
duplicates = season_df.duplicated().sum()
print(f'Number of duplicates in the dataframe are: {duplicates}')
# Identify the null values
null_values = season_df.isnull().sum().sum()
print(f'Number of null values in the dataframe are: {null_values}')
Number of duplicates in the dataframe are: 0
Number of null values in the dataframe are: 0
# Sort dataframe according to Year
season_df.sort_values("Year", inplace=True)
season_df.head()
| | Club | Year | Count | path |
|---|---|---|---|---|
| 19 | Sporting CP | 2003 | 5 | Sporting CP.png |
| 18 | Manchester United | 2004 | 6 | Manchester United.png |
| 17 | Manchester United | 2005 | 9 | Manchester United.png |
| 16 | Manchester United | 2006 | 12 | Manchester United.png |
| 15 | Manchester United | 2007 | 23 | Manchester United.png |
# Count the unique values in each column (NaN counts as a value here, unlike in describe())
pd.DataFrame(df.apply(lambda col: len(col.unique())),columns=["Unique Values Count"])
| | Unique Values Count |
|---|---|
| Season | 21 |
| Competition | 16 |
| Matchday | 52 |
| Year | 21 |
| Date | 464 |
| Venue | 2 |
| Club | 4 |
| Opponent | 125 |
| Result | 51 |
| Playing_Position | 6 |
| Minute | 106 |
| At_score | 35 |
| Type | 12 |
| Goal_assist | 87 |
# Summary statistics for the text (object) columns
df.describe(include=['object']).T
| | count | unique | top | freq |
|---|---|---|---|---|
| Season | 701 | 21 | 14/15 | 61 |
| Competition | 701 | 16 | LaLiga | 311 |
| Matchday | 701 | 52 | Group Stage | 75 |
| Date | 701 | 464 | 9/12/15 | 5 |
| Venue | 701 | 2 | H | 403 |
| Club | 701 | 4 | Real Madrid | 450 |
| Opponent | 701 | 125 | Sevilla FC | 27 |
| Result | 701 | 51 | 3:00 | 50 |
| Playing_Position | 643 | 5 | LW | 356 |
| Minute | 701 | 106 | 90 | 17 |
| At_score | 701 | 35 | 1:00 | 111 |
| Type | 686 | 11 | Right-footed shot | 251 |
| Goal_assist | 459 | 86 | Karim Benzema | 44 |
# Return an image (club logo) to be displayed on the graph
def getImage(path):
    return OffsetImage(plt.imread(path), zoom=.07, alpha=1)
# Extract year data to be displayed on the x-axis
x_axis = season_df['Year'].values
# Display graph
fig, ax = plt.subplots(figsize=(12, 4), dpi=150)
plt.title('Goals per season')
plt.xlabel('Year')
plt.ylabel('Number of Goals')
plt.xticks(x_axis)
ax.scatter(x_axis, season_df['Count'])
ax.plot(x_axis, season_df['Count'])
print('------------------------')
print('Teams Ronaldo Played for')
print('------------------------')
display(Image('Real Madrid.png', width=30))
print('Real Madrid')
display(Image('Manchester United.png', width=30))
print('Manchester United')
display(Image('Sporting CP.png', width=30))
print('Sporting CP')
display(Image('Juventus FC.png', width=30))
print('Juventus FC')
# Iterate through rows and add each club's logo at its (Year, Count) point
for index, row in season_df.iterrows():
    ab = AnnotationBbox(getImage(row['path']), (row['Year'], row['Count']), frameon=False)
    ax.add_artist(ab)
------------------------
Teams Ronaldo Played for
------------------------
Real Madrid
Manchester United
Sporting CP
Juventus FC
The above graph shows the number of goals Ronaldo scored for his clubs in each season of his career, across all club competitions (the Competition column covers 16 competitions, of which La Liga is only one).
Ronaldo scored more than 40 goals in every season plotted from 2011 to 2017, all at Real Madrid, counting all club competitions rather than La Liga alone. We have seen that he contributed to Manchester United's success, based on the club's standings and its home win percentage. The trend of his career goals further shows his impact on, and contribution to, the teams he played for.
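The "more than 40 goals" observation can be read directly from season_df instead of the chart. A sketch, assuming season_df's Year and Count columns as built above; the helper name seasons_above is ours, it sums over clubs in case a Year ever appears for two clubs, and the sample rows are illustrative.

```python
import pandas as pd

def seasons_above(season_counts: pd.DataFrame, threshold: int) -> list[int]:
    """Year labels in which the total goal count exceeds `threshold`."""
    per_year = season_counts.groupby('Year')['Count'].sum()
    return per_year[per_year > threshold].index.tolist()

# Illustrative rows in the shape of season_df
sample = pd.DataFrame({'Club': ['Manchester United', 'Manchester United', 'Real Madrid', 'Real Madrid'],
                       'Year': [2008, 2009, 2010, 2011],
                       'Count': [42, 26, 33, 53]})
print(seasons_above(sample, 40))
```

Running seasons_above(season_df, 40) on the real data would list every season label above the threshold, not only those between 2011 and 2017.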
One of the most memorable moments of CR7's international career is his hat-trick against Spain at the 2018 World Cup, which rescued a 3–3 draw for Portugal.
Let us look for patterns in Ronaldo's style of play: where on the pitch (centre, near the box, or outside the box) has he scored most of his goals? We will consider the 2017–18 season, when he played for Real Madrid:
Data required for the analysis:
Spain_matches.json: matches of the 2017–18 La Liga season, on which this analysis is based.
spain_events.json: match events, including the event name, the positions on the pitch (for example, where a goal was scored from) and the event time.
# Read La Liga (Spain) matches data; the with block closes the file afterwards
with open('Spain_matches.json') as json_file:
    laliga_matches_data = json.load(json_file)
# Real Madrid's team ID in this dataset (teamsData keys are strings)
real_madrid_team_id = '675'
# Keep only the matches in which Real Madrid took part
real_madrid_matches = [match for match in laliga_matches_data if real_madrid_team_id in match['teamsData']]
# Store Real Madrid's matches in a DataFrame
real_madrid_matches_df = pd.DataFrame(real_madrid_matches)
real_madrid_matches_df.head()
| | status | roundId | gameweek | teamsData | seasonId | dateutc | winner | venue | wyId | label | date | referees | duration | competitionId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Played | 4406122 | 38 | {'675': {'scoreET': 0, 'coachId': 275283, 'sid... | 181144 | 2018-05-19 18:45:00 | 0 | Estadio de la Cer\u00e1mica | 2565927 | Villarreal - Real Madrid, 2 - 2 | May 19, 2018 at 8:45:00 PM GMT+2 | [{'refereeId': 395085, 'role': 'referee'}, {'r... | Regular | 795 |
| 1 | Played | 4406122 | 37 | {'692': {'scoreET': 0, 'coachId': 3880, 'side'... | 181144 | 2018-05-12 18:45:00 | 675 | Estadio Santiago Bernab\u00e9u | 2565912 | Real Madrid - Celta de Vigo, 6 - 0 | May 12, 2018 at 8:45:00 PM GMT+2 | [{'refereeId': 398923, 'role': 'referee'}, {'r... | Regular | 795 |
| 2 | Played | 4406122 | 34 | {'675': {'scoreET': 0, 'coachId': 275283, 'sid... | 181144 | 2018-05-09 19:30:00 | 680 | Estadio Ram\u00f3n S\u00e1nchez Pizju\u00e1n | 2565882 | Sevilla - Real Madrid, 3 - 2 | May 9, 2018 at 9:30:00 PM GMT+2 | [{'refereeId': 384946, 'role': 'referee'}, {'r... | Regular | 795 |
| 3 | Played | 4406122 | 36 | {'675': {'scoreET': 0, 'coachId': 275283, 'sid... | 181144 | 2018-05-06 18:45:00 | 0 | Camp Nou | 2565907 | Barcelona - Real Madrid, 2 - 2 | May 6, 2018 at 8:45:00 PM GMT+2 | [{'refereeId': 378950, 'role': 'referee'}, {'r... | Regular | 795 |
| 4 | Played | 4406122 | 35 | {'675': {'scoreET': 0, 'coachId': 275283, 'sid... | 181144 | 2018-04-28 16:30:00 | 675 | Estadio Santiago Bernab\u00e9u | 2565891 | Real Madrid - Legan\u00e9s, 2 - 1 | April 28, 2018 at 6:30:00 PM GMT+2 | [{'refereeId': 385473, 'role': 'referee'}, {'r... | Regular | 795 |
# Read event data from the spain_events JSON file
with open('spain_events.json') as json_file:
    laliga_events_data = json.load(json_file)
# Store all La Liga events in a DataFrame
laliga_events_df = pd.DataFrame(laliga_events_data)
laliga_events_df.head()
| | eventId | subEventName | tags | playerId | positions | matchId | eventName | teamId | matchPeriod | eventSec | subEventId | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Simple pass | [{'id': 1801}] | 3542 | [{'y': 61, 'x': 37}, {'y': 50, 'x': 50}] | 2565548 | Pass | 682 | 1H | 2.994582 | 85 | 180864419 |
| 1 | 8 | Simple pass | [{'id': 1801}] | 274435 | [{'y': 50, 'x': 50}, {'y': 30, 'x': 45}] | 2565548 | Pass | 682 | 1H | 3.137020 | 85 | 180864418 |
| 2 | 8 | Simple pass | [{'id': 1801}] | 364860 | [{'y': 30, 'x': 45}, {'y': 12, 'x': 38}] | 2565548 | Pass | 682 | 1H | 6.709668 | 85 | 180864420 |
| 3 | 8 | Simple pass | [{'id': 1801}] | 3534 | [{'y': 12, 'x': 38}, {'y': 69, 'x': 32}] | 2565548 | Pass | 682 | 1H | 8.805497 | 85 | 180864421 |
| 4 | 8 | Simple pass | [{'id': 1801}] | 3695 | [{'y': 69, 'x': 32}, {'y': 37, 'x': 31}] | 2565548 | Pass | 682 | 1H | 14.047492 | 85 | 180864422 |
Each player is identified by a player ID. Let us create a new DataFrame with only Ronaldo's events; his ID is 3322:
# Get event records for Ronaldo (player ID 3322); copy so that new columns can be added safely
ronaldo_events_data_df = laliga_events_df.loc[laliga_events_df['playerId'] == 3322].copy()
Each event carries a list of tags. The two tag IDs we need are:
101: Goal
301: Assist
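Each event's tags field is a list of dictionaries with an 'id' key, as in the tables above, so checking for a tag is a membership test. A small illustration on a hand-written event record; the record and the helper name tag_ids are made up, and only the shape of the data follows the dataset.

```python
# A hand-written event in the same shape as the rows of laliga_events_df
event = {
    'eventName': 'Shot',
    'tags': [{'id': 101}, {'id': 402}, {'id': 1801}],
}

GOAL_TAG, ASSIST_TAG = 101, 301

def tag_ids(tags):
    """Collect the tag IDs of one event into a set for fast membership tests."""
    return {tag['id'] for tag in tags}

ids = tag_ids(event['tags'])
print(GOAL_TAG in ids, ASSIST_TAG in ids)
```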
# Return True if the event's tag list contains the given tag ID
def has_tag(tags, tag_id):
    return tag_id in [tag['id'] for tag in tags]
# Flag goals (tag 101) and assists (tag 301) with Boolean columns
ronaldo_events_data_df['Goal'] = ronaldo_events_data_df['tags'].apply(lambda x: has_tag(x, 101))
ronaldo_events_data_df['Assists'] = ronaldo_events_data_df['tags'].apply(lambda x: has_tag(x, 301))
# Display the output
ronaldo_events_data_df.head()
| | eventId | subEventName | tags | playerId | positions | matchId | eventName | teamId | matchPeriod | eventSec | subEventId | id | Goal | Assists |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 76412 | 1 | Ground attacking duel | [{'id': 501}, {'id': 703}, {'id': 1801}] | 3322 | [{'y': 26, 'x': 96}, {'y': 27, 'x': 91}] | 2565596 | Duel | 675 | 1H | 28.108732 | 11 | 189337977 | False | False |
| 76414 | 10 | Shot | [{'id': 402}, {'id': 2101}, {'id': 201}, {'id'... | 3322 | [{'y': 27, 'x': 91}, {'y': 0, 'x': 0}] | 2565596 | Shot | 675 | 1H | 31.052085 | 100 | 189337978 | False | False |
| 76457 | 8 | Simple pass | [{'id': 1801}] | 3322 | [{'y': 53, 'x': 68}, {'y': 67, 'x': 53}] | 2565596 | Pass | 675 | 1H | 146.902499 | 85 | 189338004 | False | False |
| 76589 | 10 | Shot | [{'id': 402}, {'id': 201}, {'id': 1201}, {'id'... | 3322 | [{'y': 48, 'x': 96}, {'y': 0, 'x': 0}] | 2565596 | Shot | 675 | 1H | 548.744061 | 100 | 189338889 | False | False |
| 76654 | 1 | Air duel | [{'id': 702}, {'id': 1801}] | 3322 | [{'y': 84, 'x': 62}, {'y': 81, 'x': 42}] | 2565596 | Duel | 675 | 1H | 713.899672 | 10 | 189338224 | False | False |
#Adding match information to the events DataFrame
ronaldo_events_data_df = pd.merge(ronaldo_events_data_df, real_madrid_matches_df, left_on='matchId', right_on='wyId', how="left")
ronaldo_events_data_df.head(2)
| | eventId | subEventName | tags | playerId | positions | matchId | eventName | teamId | matchPeriod | eventSec | ... | seasonId | dateutc | winner | venue | wyId | label | date | referees | duration | competitionId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Ground attacking duel | [{'id': 501}, {'id': 703}, {'id': 1801}] | 3322 | [{'y': 26, 'x': 96}, {'y': 27, 'x': 91}] | 2565596 | Duel | 675 | 1H | 28.108732 | ... | 181144 | 2017-09-20 20:00:00 | 684 | Estadio Santiago Bernab\u00e9u | 2565596 | Real Madrid - Real Betis, 0 - 1 | September 20, 2017 at 10:00:00 PM GMT+2 | [{'refereeId': 384946, 'role': 'referee'}, {'r... | Regular | 795 |
| 1 | 10 | Shot | [{'id': 402}, {'id': 2101}, {'id': 201}, {'id'... | 3322 | [{'y': 27, 'x': 91}, {'y': 0, 'x': 0}] | 2565596 | Shot | 675 | 1H | 31.052085 | ... | 181144 | 2017-09-20 20:00:00 | 684 | Estadio Santiago Bernab\u00e9u | 2565596 | Real Madrid - Real Betis, 0 - 1 | September 20, 2017 at 10:00:00 PM GMT+2 | [{'refereeId': 384946, 'role': 'referee'}, {'r... | Regular | 795 |
2 rows × 28 columns
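Because this is a left merge, any event whose matchId has no counterpart among Real Madrid's matches would silently get NaN match columns. pandas' merge can flag such rows with indicator=True and enforce the expected many-events-to-one-match relationship with validate='many_to_one'. A sketch on two tiny made-up frames:

```python
import pandas as pd

events = pd.DataFrame({'matchId': [1, 1, 2, 3], 'eventName': ['Shot', 'Pass', 'Duel', 'Shot']})
matches = pd.DataFrame({'wyId': [1, 2], 'label': ['A - B, 1 - 0', 'C - A, 2 - 2']})

# validate raises MergeError if a wyId appears more than once in `matches`
merged = pd.merge(events, matches, left_on='matchId', right_on='wyId',
                  how='left', validate='many_to_one', indicator=True)
# '_merge' is 'both' when a match was found and 'left_only' otherwise
unmatched = merged[merged['_merge'] == 'left_only']
print(len(unmatched), 'event(s) without match information')
```

On the notebook's data, an empty unmatched frame would confirm that every Ronaldo event was linked to one of Real Madrid's matches.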
# Number of goals scored (events tagged 101)
goals = [ronaldo_events_data_df['Goal'].sum()]
# Number of assists provided (events tagged 301)
assists = [ronaldo_events_data_df['Assists'].sum()]
# Number of events recorded as shots
shots_attempted = [(ronaldo_events_data_df['eventName'] == 'Shot').sum()]
statistics = pd.DataFrame([goals, assists, shots_attempted],
                          columns=['Ronaldo'],
                          index=['Goal', 'Assists', 'Shots'])
print("-----Ronaldo's Goals, Assists and Shots Attempted-----")
statistics.head()
-----Ronaldo's Goals, Assists and Shots Attempted-----
| | Ronaldo |
|---|---|
| Goal | 26 |
| Assists | 5 |
| Shots | 151 |
# Method to draw a football pitch
# Initialize football pitch variables
width = 700            # figure width in pixels
height = 350           # figure height in pixels
width_pitch = 104      # pitch length in metres (x axis)
height_pitch = 68      # pitch width in metres (y axis)
pitch_color = 'green'  # pitch fill colour, kept separate from the `color` print helper class
line_color = 'white'
grey_color = '#808080'
def draw_football_pitch():
    # Create figure for plotting
    pitch = figure(width=width, height=height, toolbar_location="right")
    # Draw outline of the empty pitch
    pitch.rect(x=width_pitch/2., y=height_pitch/2., width=width_pitch, height=height_pitch, fill_color=pitch_color, line_width=2, line_color=line_color)
    # Draw left penalty arc (a circle, partly covered by the penalty area drawn next)
    pitch.circle(16.5, height_pitch/2., size=50, fill_color=pitch_color, line_width=2, line_color=line_color)
    # Draw left penalty area (bigger rectangle)
    pitch.rect(x=16.5/2., y=height_pitch/2., width=16.5, height=40.3, fill_color=pitch_color, line_width=2, line_color=line_color)
    # Draw left goal area (smaller rectangle)
    pitch.rect(x=5.5/2., y=height_pitch/2., width=5.5, height=18.3, fill_color=pitch_color, line_width=2, line_color=line_color)
    # Draw left goal post
    pitch.rect(x=0, y=height_pitch/2., width=0.5, height=7.3, fill_color=pitch_color, line_width=2, line_color=line_color)
    # Draw left penalty spot
    pitch.circle(11, height_pitch/2., size=2, fill_color=line_color, line_width=2, line_color=line_color)
    # Draw right penalty arc and penalty area
    pitch.circle((width_pitch-16.5), height_pitch/2., size=50, fill_color=pitch_color, line_width=2, line_color=line_color)
    pitch.rect(x=width_pitch-(16.5/2.), y=height_pitch/2., width=16.5, height=40.3, fill_color=pitch_color, line_width=2, line_color=line_color)
    # Draw right goal area (smaller rectangle)
    pitch.rect(x=width_pitch-(5.5/2.), y=height_pitch/2., width=5.5, height=18.3, fill_color=pitch_color, line_width=2, line_color=line_color)
    # Draw right goal post
    pitch.rect(x=width_pitch, y=height_pitch/2., width=0.5, height=7.3, fill_color=line_color, line_width=2, line_color=line_color)
    # Draw right penalty spot
    pitch.circle((width_pitch-11), height_pitch/2., size=2, fill_color=line_color, line_width=2, line_color=line_color)
    # Draw centre circle, centre spot and halfway line
    pitch.circle(width_pitch/2.0, y=height_pitch/2.0, size=100, fill_color=pitch_color, line_width=2, line_color=line_color)
    pitch.circle(width_pitch/2.0, y=height_pitch/2.0, size=2, fill_color=line_color, line_width=2, line_color=line_color)
    pitch.line([width_pitch/2.0, width_pitch/2.0], [0, height_pitch], line_width=2, line_color=line_color)
    return pitch
# Plot the positions from which shots or goals were taken
def plot_position_data(positions, action_name, color_of_plot):
    # Event positions are percentages (0-100) of the pitch length (x) and width (y);
    # rescale them to the pitch dimensions used by draw_football_pitch
    x_axis = [(event[0]['x'] * width_pitch) / 100. for event in positions]
    y_axis = [(event[0]['y'] * height_pitch) / 100. for event in positions]
    pitch = draw_football_pitch()
    pitch.circle(x_axis, y_axis, fill_color=color_of_plot, line_width=1, line_color="blue", fill_alpha=0.2, size=8)
    # Annotate the plot with the number of events shown
    player_statistics = bokeh.models.Label(x=90, y=280, x_units='screen', y_units='screen', text=str(len(x_axis)) + " " + action_name, text_font_size='20px', text_color='white')
    pitch.add_layout(player_statistics)
    return pitch
# Start positions of Ronaldo's events tagged as goals
ronaldo_goals = ronaldo_events_data_df[ronaldo_events_data_df['Goal']]['positions']
# Start positions of Ronaldo's events recorded as shots
shots_data = ronaldo_events_data_df[ronaldo_events_data_df['eventName'] == 'Shot']
ronaldo_shots = shots_data['positions']
# Plot Ronaldo's goals
plot_goals = plot_position_data(ronaldo_goals, 'Goals', 'grey')
# Plot Ronaldo's shots
plot_shots = plot_position_data(ronaldo_shots, 'Shots', 'grey')
# Helper class holding the ANSI escape code for bold printed text
class color:
    BOLD = '\033[1m'
#Visualize shots
print(color.BOLD + 'Positions from which Ronaldo attempted a Goal')
print(color.BOLD + '------------------------------------------')
show(plot_shots)
Positions from which Ronaldo attempted a Goal
------------------------------------------
The pitch plot shows that Ronaldo attempted 151 shots. Each blue-outlined circle marks a position from which he attempted a shot.
Let us now analyze how many of these shots attempted are converted into goals:
#Visualize Goals
display(Image('Ronaldo.jpeg', width=100))
print(color.BOLD + 'Positions from which Ronaldo scored a Goal')
print(color.BOLD + '------------------------------------------')
show(plot_goals)
Positions from which Ronaldo scored a Goal
------------------------------------------
From the plots above, we can see that all of Ronaldo's goals in the 2017–18 La Liga season for Real Madrid were scored from inside the penalty area.
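The "inside the box" observation can be checked numerically as well as visually. A sketch, assuming (as the pitch drawing does) a 104 m × 68 m pitch and event coordinates given as percentages of pitch length and width from the attacking side's point of view; under those assumptions the attacked penalty area starts 16.5 m from the goal line and spans 40.3 m across the middle of the pitch. The helper name in_penalty_area is ours.

```python
PITCH_LENGTH, PITCH_WIDTH = 104.0, 68.0   # metres, as in draw_football_pitch
BOX_DEPTH, BOX_WIDTH = 16.5, 40.3         # penalty area dimensions in metres

def in_penalty_area(x_pct: float, y_pct: float) -> bool:
    """True if a position (percent of length, percent of width) lies in the attacked penalty area."""
    x_m = x_pct * PITCH_LENGTH / 100
    y_m = y_pct * PITCH_WIDTH / 100
    return x_m >= PITCH_LENGTH - BOX_DEPTH and abs(y_m - PITCH_WIDTH / 2) <= BOX_WIDTH / 2

# Start positions of three events from the tables above
print(in_penalty_area(91, 27), in_penalty_area(96, 48), in_penalty_area(68, 53))
```

Applied to the goals, all(in_penalty_area(p[0]['x'], p[0]['y']) for p in ronaldo_goals) would confirm or refute the observation in one line.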
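He scored 26 goals in the season, showing how lethal he is once inside the penalty box. Note that goals and shots are counted differently here: the 26 goals include every one of Ronaldo's events tagged as a goal, whereas the 151 shots include only events named "Shot"; in this dataset's event taxonomy, penalties and direct free-kick shots are recorded as "Free Kick" events, so they are not among the 151 shots.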
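A conversion rate that puts goals and attempts on the same footing would also count penalties and direct free-kick shots as attempts. A sketch, assuming the sub-event names 'Penalty' and 'Free kick shot' are the ones this dataset uses for those set pieces (we have not checked them against Ronaldo's events); the helper name conversion_rate is ours and the four-row frame is made up.

```python
import pandas as pd

# Assumed sub-event names for set-piece attempts on goal in this dataset's taxonomy
SET_PIECE_SHOTS = {'Penalty', 'Free kick shot'}

def conversion_rate(events: pd.DataFrame) -> float:
    """Goals divided by attempts, where attempts are shots plus set-piece shots."""
    attempts = (events['eventName'] == 'Shot') | events['subEventName'].isin(SET_PIECE_SHOTS)
    return events.loc[attempts, 'Goal'].sum() / attempts.sum()

# Made-up events in the shape of ronaldo_events_data_df
events = pd.DataFrame({
    'eventName': ['Shot', 'Shot', 'Free Kick', 'Pass'],
    'subEventName': ['Shot', 'Shot', 'Penalty', 'Simple pass'],
    'Goal': [True, False, True, False],
})
print(f'{conversion_rate(events):.1%}')
```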
Our analysis supports the view that Cristiano Ronaldo is one of the best players in the history of football, and that he has been a huge part of the success of the clubs he has played for.
Kaggle - https://www.kaggle.com/
Research paper - A public data set of spatio-temporal match events in soccer competitions, https://www.nature.com/articles/s41597-019-0247-7
Wikipedia - List of Manchester United F.C. seasons, https://en.wikipedia.org/wiki/List_of_Manchester_United_F.C._seasons
Images - https://www.google.com/